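One of the major difficulties of reinforcement learning is learning from {\em off-policy} samples, which are collected by a policy (the behavior policy) different from the one the algorithm evaluates (the target policy). Off-policy learning needs to correct the distribution of the samples from that of the behavior policy to that of the target policy. Unfortunately, importance sampling has an inherent high-variance problem, which leads to poor gradient estimates in policy gradient methods. We focus on an off-policy Actor-Critic architecture and propose a novel method, called Preconditioned Proximal Policy Optimization (P3O), which can control the high variance of importance sampling by applying a preconditioner to the Conservative Policy Iteration (CPI) objective. {\em This preconditioning uses the sigmoid function in a special way: when there is no policy change, the gradient is maximal, and hence the policy gradient will drive a large parameter update to explore the parameter space efficiently}. This is a novel exploration method that has not been studied before, given that existing exploration methods are based on the novelty of states and actions. We compare with several best-performing algorithms on both discrete and continuous tasks, and the results show that {\em PPO is insufficiently off-policy} and that our P3O is {\em more off-policy} than PPO according to the "off-policyness" measured by the DEON metric; P3O also explores a larger policy space than PPO. The results further show that our P3O maximizes the CPI objective better than PPO during training.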
Translated by Google Translate
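The claim that the sigmoid preconditioner gives the largest gradient when the policy has not changed can be checked with a small sketch. The surrogate below, `sigmoid(tau * (ratio - 1)) * advantage`, is an assumed illustrative form, not necessarily the exact P3O objective; `tau` and the function names are hypothetical. Here `ratio` stands for the importance ratio between the target and behavior policies, so `ratio = 1` means no policy change.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preconditioned_surrogate(ratio, advantage, tau=4.0):
    # Assumed form: the CPI term ratio * advantage is replaced by a
    # sigmoid of the shifted ratio, which keeps the surrogate bounded.
    return sigmoid(tau * (ratio - 1.0)) * advantage

def surrogate_grad_wrt_ratio(ratio, advantage, tau=4.0):
    # Derivative of the surrogate above with respect to the ratio:
    # tau * s * (1 - s) * advantage, where s is the sigmoid value.
    s = sigmoid(tau * (ratio - 1.0))
    return tau * s * (1.0 - s) * advantage

ratios = [0.2, 0.6, 1.0, 1.4, 3.0]
grads = [surrogate_grad_wrt_ratio(r, advantage=1.0) for r in ratios]
for r, g in zip(ratios, grads):
    print(f"ratio={r:.1f}  gradient={g:.4f}")
```

Under this assumed form the gradient peaks at `ratio = 1` and shrinks as the ratio moves away from 1, while the surrogate itself stays bounded by the magnitude of the advantage, which is one way a preconditioner could limit the variance contributed by large importance ratios.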
Molecular representation learning is crucial for the problem of molecular property prediction, where graph neural networks (GNNs) serve as an effective solution due to their structure modeling capabilities. Since labeled data is often scarce and expensive to obtain, it is a great challenge for GNNs to generalize in the extensive molecular space. Recently, the training paradigm of "pre-train, fine-tune" has been leveraged to improve the generalization capabilities of GNNs. It uses self-supervised information to pre-train the GNN, and then performs fine-tuning to optimize the downstream task with just a few labels. However, pre-training does not always yield statistically significant improvement, especially for self-supervised learning with random structural masking. In fact, the molecular structure is characterized by motif subgraphs, which are frequently occurring and influence molecular properties. To leverage the task-related motifs, we propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT). MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. The prompt effectively augments the molecular graph with meaningful motifs in the continuous representation space; this provides more structural patterns to aid the downstream classifier in identifying molecular properties. Extensive experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction, with or without a few fine-tuning steps.
Adder Neural Network (AdderNet) provides a new way of developing energy-efficient neural networks by replacing the expensive multiplications in convolution with cheaper additions (i.e., the l1-norm). To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet. Because the commutative law that holds for multiplication does not hold for the l1-norm, the well-established quantization methods for convolutional networks cannot be applied to AdderNets. Thus, the existing AdderNet quantization techniques use only one shared scale to quantize both the weights and the activations simultaneously. Admittedly, such an approach preserves the commutative law in the l1-norm quantization process, but the accuracy drop after low-bit quantization cannot be ignored. To this end, we first thoroughly analyze the difference between the distributions of weights and activations in AdderNet and then propose a new quantization algorithm that redistributes the weights and the activations. Specifically, the pre-trained full-precision weights in different kernels are clustered into different groups, so that scales shared within a group and independent across groups can be adopted. To further compensate for the accuracy drop caused by the distribution difference, we then develop a lossless range clamp scheme for weights and a simple yet effective outlier clamp strategy for activations. Thus, the functionality of full-precision weights and the representation ability of full-precision activations can be fully preserved. The effectiveness of the proposed quantization method for AdderNet is verified on several benchmarks; e.g., our 4-bit post-training quantized adder ResNet-18 achieves a 66.5% top-1 accuracy on ImageNet with comparable energy efficiency, which is about 8.5% higher than that of the previous AdderNet quantization methods.
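The reason a shared scale matters for the l1-norm can be illustrated with a minimal sketch. It assumes a plain symmetric uniform quantizer and a single input patch and kernel; the helper names, the sample values, and the scales are hypothetical and are not taken from the paper.

```python
def quantize(values, scale, bits=4):
    # Symmetric uniform quantizer: round to the nearest integer level
    # and clamp to the signed range representable with `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def adder_response(x, w):
    # AdderNet-style response: negative l1 distance between an input
    # patch and a kernel (no multiplications between x and w).
    return -sum(abs(a - b) for a, b in zip(x, w))

x = [0.5, -1.0, 1.5, 0.25]    # activations (illustrative values)
w = [0.25, -0.5, 1.0, -0.75]  # weights (illustrative values)

# Shared scale: |s*qx - s*qw| = s * |qx - qw|, so the distance can be
# accumulated in integers and rescaled once at the end.
s = 0.25
qx, qw = quantize(x, s), quantize(w, s)
shared = s * adder_response(qx, qw)

# Separate scales: |sx*qx - sw*qw| does not factor, so the integer-only
# distance rescaled by either scale no longer matches the true response.
sx, sw = 0.125, 0.25
qx2, qw2 = quantize(x, sx, bits=8), quantize(w, sw, bits=8)
int_only = adder_response(qx2, qw2)

print("full precision :", adder_response(x, w))
print("shared scale   :", shared)
print("separate scales:", sx * int_only, "or", sw * int_only)
```

With a shared scale the integer computation reproduces the full-precision response exactly for these values, whereas with two different scales neither rescaling of the integer result does. Grouping kernels so that each group has its own scale, as the abstract describes, keeps this factoring within a group while letting scales differ across groups.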
We discuss two kinds of semantics relevant to Computer Vision (CV) systems - Visual Semantics and Lexical Semantics. While visual semantics focus on how humans build concepts when using vision to perceive a target reality, lexical semantics focus on how humans build concepts of the same target reality through the use of language. The lack of coincidence between visual and lexical semantics, in turn, has a major impact on CV systems in the form of the Semantic Gap Problem (SGP). The paper, while extensively exemplifying the lack of coincidence as above, introduces a general, domain-agnostic methodology to enforce alignment between visual and lexical semantics.
In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry, so that pretraining popular language models on customized datasets becomes affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our toolkit is over $6\times$ lower for BERT Base and $9\times$ lower for BERT Large compared with the original BERT paper. The documentation and code are released at https://github.com/extreme-bert/extreme-bert under the Apache-2.0 license.
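Classification of airborne laser scanning (ALS) point clouds is a critical task in the fields of remote sensing and photogrammetry. Although recent deep learning-based methods have achieved satisfactory performance, they ignore the unicity of the receptive field, which leaves ALS point cloud classification challenging when it comes to distinguishing areas with complex structures and extreme scale variations. In this paper, in order to configure multi-receptive-field features, we propose a novel receptive field fusion-and-stratification network (RFFS-Net). With a novel dilated graph convolution (DGConv) and its extension, annular dilated convolution (ADConv), as the basic building blocks, the receptive field fusion process is implemented with the dilated and annular graph fusion (DAGFusion) module, which obtains a multi-receptive-field feature representation by capturing dilated and annular graphs with various receptive regions. On their respective calculation bases, the stratification of the receptive fields is performed by multi-level decoders nested in RFFS-Net and is driven by the multi-level receptive field aggregation loss (MRFALoss), which pushes the network to learn in the direction of supervision labels with different resolutions. With receptive field fusion and stratification, RFFS-Net is more adaptable to the classification of regions with complex structures and extreme scale variations in large-scale ALS point clouds. Evaluated on the ISPRS Vaihingen 3D dataset, our RFFS-Net significantly outperforms the baseline approach, by 5.3% on mF1 and also on mIoU, achieving an overall accuracy of 82.1%, an mF1 of 71.6%, and an mIoU of 58.2%. Furthermore, experiments on the LASDU dataset and the 2019 IEEE-GRSS Data Fusion Contest dataset show that RFFS-Net achieves new state-of-the-art classification performance.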
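Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results. Existing methods have made dedicated efforts to restore degraded character images. However, the denoising results obtained by these methods do not appear to improve character recognition performance. This is mainly because current methods focus only on pixel-level information and ignore critical features of a character, such as its glyph, which results in damage to the character glyph during the denoising process. In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone, which maintains the consistency of character glyphs during character image denoising. Moreover, we utilize attention-based networks for global-local feature interaction, which helps to deal with blind denoising and enhances denoising performance. We compare CharFormer with state-of-the-art methods on multiple datasets. The experimental results show the superiority of CharFormer both quantitatively and qualitatively.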
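Constructing high-quality character image datasets is challenging because real-world images are often affected by image degradation. There are limitations when applying current image restoration methods to such real-world character images, since (i) the categories of noise in character images differ from those in general images; and (ii) real-world character images usually contain more complex image degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, in which character skeleton information and scale-ensembling feature extraction are used to obtain better restoration performance. The proposed method consists of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet aims to preserve the structural consistency of the character and to normalize complex noise. CiRNet then reconstructs clean images from the degraded character images and their skeletons. Due to the lack of benchmarks for real-world character image restoration, we constructed a dataset containing 1,606 character images with real-world degradation to evaluate the effectiveness of the proposed method. The experimental results demonstrate that RCRN outperforms state-of-the-art methods both quantitatively and qualitatively.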
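The long-tail effect is a common issue that limits the performance of deep learning models on real-world datasets. Because characters differ in usage frequency, character image datasets are also affected by such imbalanced data distributions. Thus, current character recognition methods are limited when applied to real-world datasets, especially for character categories in the tail that lack training samples, e.g., uncommon characters or characters from historical documents. In this paper, we propose a zero-shot character recognition framework via radical extraction, i.e., REZCR, to improve the recognition performance on few-sample character categories, in which we exploit information about radicals, the graphical units of characters, by decomposing and reconstructing characters according to orthography. REZCR consists of an attention-based radical information extractor (RIE) and a knowledge-graph-based character reasoner (KGR). The RIE aims to recognize candidate radicals and their possible structural relations from character images. The results are then fed into the KGR to recognize the target character by reasoning with a pre-designed character knowledge graph. We validate our method on multiple datasets, and REZCR shows promising experimental results, especially on few-sample character datasets.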
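At present, quantization methods for neural network models are mainly divided into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small amount of data to complete the quantization process, but the performance of its quantized model is not as good as that of quantization-aware training. This paper presents a novel quantization method called Attention Round. The method gives a parameter w the opportunity to be mapped to all possible quantized values, rather than only the two quantized values nearest to w, during quantization. The probability of being mapped to a given quantized value is negatively correlated with the distance between that quantized value and w, and decays according to a Gaussian function. In addition, this paper uses the lossy coding length as a measure for assigning bit widths to the different layers of the model, which addresses mixed-precision quantization while avoiding the need to solve a combinatorial optimization problem. This paper also reports quantization experiments on different models, and the results confirm the effectiveness of the method. For ResNet18 and MobileNetV2, the post-training quantization proposed in this paper requires only 1,024 training samples and 10 minutes to complete the quantization process, and achieves quantization performance on par with quantization-aware training.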
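The mapping rule described above, where every quantized value is a candidate and the probability decays as a Gaussian of its distance to w, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the width parameter `sigma`, and the 3-bit grid are hypothetical, and how the method uses these probabilities during optimization is not covered here.

```python
import math
import random

def rounding_probabilities(w, grid, sigma):
    # Gaussian weight for every candidate quantized value q, based on
    # its distance to the full-precision parameter w (assumed form).
    logits = [-((q - w) ** 2) / (2.0 * sigma ** 2) for q in grid]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [v / total for v in weights]

def sample_quantized(w, grid, sigma, rng):
    # Draw one quantized value for w from the distribution above.
    probs = rounding_probabilities(w, grid, sigma)
    return rng.choices(grid, weights=probs, k=1)[0]

grid = [i * 0.5 for i in range(-4, 4)]  # a 3-bit grid with step 0.5
w = 0.3
probs = rounding_probabilities(w, grid, sigma=0.5)
for q, p in zip(grid, probs):
    print(f"q={q:+.1f}  p={p:.4f}")

rng = random.Random(0)
print("one sample:", sample_quantized(w, grid, 0.5, rng))
```

In contrast to nearest rounding or stochastic rounding between the two neighboring levels, every level on the grid receives nonzero probability here, with the nearest level the most likely. A smaller `sigma` concentrates the mass on the nearest levels.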